Abstract

Combining the Hardy-Littlewood $k$-tuple conjecture with a heuristic application of
extreme-value statistics, we propose a family of estimator formulas for predicting maximal
gaps between prime $k$-tuples. Extensive computations show that the estimator
$a\log(x/a) - ba$ satisfactorily predicts the maximal gaps below $x$, in most cases within
an error of $\pm 2a$, where $a = C_k \log^k x$ is the expected average gap between the same
type of $k$-tuples. Heuristics suggest that maximal gaps between prime $k$-tuples near $x$
are asymptotically equal to $a\log(x/a)$, and thus have the order $O(\log^{k+1} x)$. The
distribution of maximal gaps around the "trend" curve $a\log(x/a)$ is close to the Gumbel
distribution. We explore two implications of this model of gaps: record gaps between
primes and Legendre-type conjectures for prime $k$-tuples.

1 Introduction 

Gaps between consecutive primes have been extensively studied. The prime number theorem
[15, p. 10] suggests that "typical" prime gaps near $p$ have the size about $\log p$. On the other
hand, maximal prime gaps grow no faster than $O(p^{0.525})$ [15, p. 13]. Cramér [4] conjectured
that gaps between consecutive primes $p_n - p_{n-1}$ are at most about as large as $\log^2 p$, that is,
$\limsup (p_n - p_{n-1}) / \log^2 p_n = 1$ when $p_n \to \infty$. Moreover, Shanks [26] stated that maximal
prime gaps $G(p)$ satisfy the asymptotic equality $\sqrt{G(p)} \sim \log p$. All maximal gaps between
primes are now known, up to low 19-digit primes (OEIS A005250) [27], [21]. This data
apparently supports the Cramér and Shanks conjectures^1: thus far, if we divide by $\log^2 p$ the
maximal gap ending at $p$, the resulting ratio is always less than one, but tends to grow
closer to one, albeit very slowly and irregularly.



^1 While the Shanks conjecture $\sqrt{G(p)} \sim \log p$ is plausible, the "inverted" Shanks conjecture
$p \sim e^{\sqrt{G(p)}}$ is likely false. (In general, $X \sim Y \not\Rightarrow e^X \sim e^Y$; for example,
$x + \log x \sim x$, but $e^{x + \log x} \not\sim e^x$ as $x \to \infty$.) Wolf [29, p. 21] proposes an improvement:
a gap $G(p)$ is likely to appear near $p \sim \sqrt{G(p)}\, e^{\sqrt{G(p)}}$.

Less is known about maximal gaps between prime constellations, or prime $k$-tuples. One
can conjecture that average gaps between prime $k$-tuples near $p$ are $O(\log^k p)$ as $p \to \infty$,
in agreement with the Hardy-Littlewood $k$-tuple conjecture [H]. Kelly and Pilling [16],
Fischer [5] and Wolf [30] report heuristics and computations for gaps between twin primes
($k = 2$). Kelly and Pilling [17] also provide physically-inspired heuristics for prime triplets
($k = 3$); Fischer [6] conjectures formulas for maximal gaps between $k$-tuples for both $k = 2$
and $k = 3$. All of these conjectures and heuristics, as well as extensive computations, suggest
that maximal gaps between prime $k$-tuples are at most about $\log p$ times the average gap,
which implies that maximal gaps are $O(\log^{k+1} p)$ as $p \to \infty$.

In this article we use extreme value statistics to derive a general formula predicting the
size of record gaps between $k$-tuples below $p$: maximal gaps are approximately $a\log(p/a) - ba$,
with probable error $O(a)$. Here $a = C_k \log^k p$ is the expected average gap near $p$, and $C_k$
and $b$ are parameters depending on the type of $k$-tuple. This formula approximates maximal
gaps better and in a wider range than a linear function of $\log^{k+1} p$. We will mainly focus on
three types of prime $k$-tuples:

• k = 2: twin primes (maximal gaps are OEIS A113274);

• k = 4: prime quadruplets (maximal gaps are OEIS A113404);

• k = 6: prime sextuplets (maximal gaps are OEIS A200503).

The observations can be readily applied to other $k$-tuples; however, numerical values of
constants $C_k$ will change depending on the specific type of $k$-tuple. See, e.g., the following
OEIS sequences for data on maximal gaps between prime $k$-tuples for other $k$:

• k = 3: prime triplets (maximal gaps are A201596 and A201598); 

• k = 5: prime quintuplets (maximal gaps are A201073 and A201062); 

• k = 7: prime septuplets (maximal gaps are A201051 and A201251); 

• k = 10: prime decuplets (maximal gaps are A202281 and A202361). 

2 Definitions, notations, examples 

Twin primes are pairs of consecutive primes that have the form {p, p + 2}. (This is the densest
repeatable pattern of two primes.) Prime quadruplets are clusters of four consecutive primes
of the form {p, p + 2, p + 6, p + 8} (densest repeatable pattern of four primes). Prime
sextuplets are clusters of six consecutive primes of the form {p, p + 4, p + 6, p + 10, p + 12,
p + 16} (densest repeatable pattern of six primes).

Prime $k$-tuples are clusters of $k$ consecutive primes that have a repeatable pattern. Thus,
twin primes are a specific type of prime $k$-tuples, with $k = 2$; prime quadruplets are another
specific type of prime $k$-tuples, with $k = 4$; and prime sextuplets are yet another type of
prime $k$-tuples, with $k = 6$. (The densest $k$-tuples possible for a given $k$ may also be called
prime constellations or prime $k$-tuplets.)

Gaps between prime $k$-tuples are center-to-center distances between consecutive $k$-tuples of
the same type. If the prime at the end of the gap is $p$, we denote the gap $g_k(p)$. For example,
the gap between the quadruplets {11, 13, 17, 19} and {101, 103, 107, 109} is $g_4(101) = 90$.
The gap between the twin primes {17, 19} and {29, 31} is $g_2(29) = 12$. (Hereafter $p$ always
denotes a prime. In the context of gaps between prime $k$-tuples, $p$ will refer to the end-of-gap
prime. Note that the start-of-gap prime might be orders of magnitude smaller than the gap
size itself; e.g., the gap $g_6(16057) = 15960$ starts at {97, 101, 103, 107, 109, 113}; the gap
$g_6(1091257) = 1047480$ starts at {43777, 43781, 43783, 43787, 43789, 43793}.)

A maximal gap is a gap that is strictly greater than all preceding gaps. In other words,
a maximal gap is the first occurrence of a gap at least this size. As an example, consider
gaps between prime quadruplets (4-tuples): the gap of 90 preceding the quadruplet
{101, 103, 107, 109} is a maximal gap (i.e. the first occurrence of a gap of at least 90), while
the gap of 90 preceding {191, 193, 197, 199} is not a maximal gap (not the first occurrence of
a gap at least this size). A synonym for maximal gap is record gap. By $G_k(x)$ we will denote
the largest gap between $k$-tuples below $x$. (Note: Statements like this will always refer to a
specific type of $k$-tuples.) We readily see that

$g_k(p) \le G_k(p)$ wherever $g_k(p)$ is defined, and
$g_k(p) = G_k(p)$ if $g_k(p)$ is a maximal gap.

In rare cases, the equality $g_k(p) = G_k(p)$ may also hold for non-maximal gaps $g_k(p)$; e.g.,
$g_4(191) = G_4(191) = 90$ even though the gap $g_4(191)$ is not maximal.
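
The definitions above can be illustrated with a short computation. The following Python sketch is an illustration only, not the program used for the computations reported in this article; the function names are our own. It sieves the primes, collects the initial primes of all $k$-tuples with a given pattern of offsets, and lists the maximal gaps. Gaps are measured between initial primes, which equals the center-to-center distance because consecutive tuples share the same pattern.

```python
def prime_flags(n):
    """Sieve of Eratosthenes: flags[i] == 1 exactly when i is prime (0 <= i <= n)."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            # Clear every multiple of i starting at i*i.
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return flags


def tuple_starts(pattern, limit):
    """Initial primes p <= limit such that p + d is prime for every offset d in pattern."""
    flags = prime_flags(limit + max(pattern))
    return [p for p in range(2, limit + 1) if all(flags[p + d] for d in pattern)]


def maximal_gaps(pattern, limit):
    """(start, end, gap) for each gap strictly greater than all preceding gaps."""
    starts = tuple_starts(pattern, limit)
    records, best = [], 0
    for prev, cur in zip(starts, starts[1:]):
        gap = cur - prev
        if gap > best:
            best = gap
            records.append((prev, cur, gap))
    return records


TWINS = (0, 2)
QUADRUPLETS = (0, 2, 6, 8)

for start, end, gap in maximal_gaps(QUADRUPLETS, 10_000):
    print(start, end, gap)
```

For the quadruplet pattern the list contains the gap of 90 ending at 101 but skips the second gap of 90 ending at 191, in agreement with the example above.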

The average gap between $k$-tuples near $x$ is denoted $\bar g_k(x)$ and defined here as

$$\bar g_k(x) = \frac{\sum_{\frac12 x \le p - g_k(p) < p \le \frac32 x} g_k(p)}{\sum_{\frac12 x \le p - g_k(p) < p \le \frac32 x} 1} = \frac{\text{the sum of all gaps between $k$-tuples with } p \in [\frac12 x, \frac32 x]}{\text{total count of gaps between $k$-tuples with } p \in [\frac12 x, \frac32 x]}.$$

(The value of $\bar g_k(x)$ is undefined if there are fewer than two $k$-tuples with $\frac12 x \le p \le \frac32 x$.)
The expected average gap between $k$-tuples near $x$ (for any $x > 3$) is defined formally as
$a = a(x) = C_k \log^k x$, where the positive coefficient $C_k$ is determined by the type of the
$k$-tuples. (See the Conjectures section for further details on this.)

3 Motivation: is a simple linear fit for $G_k(p)$ adequate?

The first ten or so terms in sequences of record gaps (e.g., A113274, A113404, A200503)
seem to indicate that maximal gaps between $k$-tuples below $p$ grow about as fast as a linear
function of $\log^{k+1} p$. For twin primes ($k = 2$), Rodriguez and Rivera [21] gave simple linear
approximations of record gaps, while Fischer [6] and Wolf [30] proposed more sophisticated
non-linear formulas. Why bother with any non-linearity at all? Let us look at the data.
Table 1 presents the best-fit lines for record gaps between $k$-tuples below $10^{15}$, as well as the
coefficients of determination ($R^2$) quantifying the goodness of fit.






TABLE 1

Best-fit lines and coefficients $R^2$ for maximal gaps between prime $k$-tuples ($k$ = 2, 4, 6).

Trendline equations for maximal gaps between prime $k$-tuples:

End-of-gap prime p     twin primes              prime quadruplets        prime sextuplets
                       (k = 2; ξ = log^3 p)     (k = 4; ξ = log^5 p)     (k = 6; ξ = log^7 p)

1 < p < 10^6           0.4576ξ (R^2 = 0.947)    0.0627ξ (R^2 = 0.974)    0.0016ξ (R^2 = 0.963)
10^6 < p < 10^9        0.4756ξ (R^2 = 0.904)    0.1031ξ (R^2 = 0.926)    0.0147ξ (R^2 = 0.671)
10^9 < p < 10^12       0.5203ξ (R^2 = 0.944)    0.1245ξ (R^2 = 0.805)    0.0181ξ (R^2 = 0.952)
10^12 < p < 10^15      0.5628ξ (R^2 = 0.974)    0.1451ξ (R^2 = 0.962)    0.0249ξ (R^2 = 0.960)

Table 1 shows that, for a fixed $k$, record gaps between larger $k$-tuples have a steeper trendline
(when plotted against $\log^{k+1} p$). This is not a "one-slope-fits-all" situation! There is a good
reason to expect that the same tendency holds in general for any $k$: as we will see in the
next sections, there exist curves that predict the record gap sizes, on average, better than
any linear function of $\log^{k+1} p$, and the farther from zero, the steeper are these curves
(approaching certain limit values of slope, $C_k$). Nevertheless, a linear approximation can
also be useful; computations and heuristics suggest that a linear function of $\log^{k+1} p$ can
serve as a convenient upper bound for gaps. For example: maximal gaps between twin primes
are less than $0.76 \log^3 p$. In what follows, we will combine the Hardy-Littlewood $k$-tuple
conjecture with extreme value statistics to better predict the sizes of maximal gaps between
prime $k$-tuples of any given type, accounting for their non-linear growth trend.



4 Conjectures 

In this section we state several conjectures based on plausible heuristics and supported
by extensive computations. As far as rigorous proofs are concerned, we do not even know
whether there are infinitely many $k$-tuples of a given type; e.g., whether there are infinitely
many twin primes for $k = 2$. (The famous twin prime conjecture thus far remains unproven.
A fortiori there is no known proof of the more general $k$-tuple conjecture described below.)



4.1 The Hardy-Littlewood k-tuple conjecture

The Hardy-Littlewood $k$-tuple conjecture [2], [23, pp. 60-68] predicts the approximate total
counts of prime $k$-tuples (with a given admissible^2 pattern):

The total number of prime $k$-tuples below $x$ $\sim H_k \int_2^x \frac{dt}{\log^k t}$.

The actual counts of $k$-tuples match this prediction with a surprising accuracy [23, p. 62].
The coefficients $H_k$ are called the Hardy-Littlewood constants. Note that, in general, the
constants depend on $k$ and on the specific type of $k$-tuple (e.g., there are three types of
prime octuplets, with two different constants). Hardy and Littlewood not only conjectured
the above integral formula but also provided a recipe for computing the constants $H_k$ as
products over subsets of primes. For example, in the special cases $k = 2, 4, 6$ the constants
$H_2$, $H_4$, $H_6$ are products over the primes $p \ge 3$, $p \ge 5$, $p \ge 7$, respectively.

^2 Any pattern of $k$ primes is deemed admissible (repeatable) unless it is prohibited by divisibility
considerations. For instance, the pattern of {p, p + 2, p + 4} is prohibited: one of the numbers p, p + 2, p + 4 must
be divisible by 3. But {p, p + 2, p + 6} is not prohibited, hence admissible. For a more detailed discussion
of admissible patterns, see [22, pp. 62-63].

These formulas for $H_k$ have slow convergence. Riesel [23] and Cohen [3] describe efficient
methods for computing $H_k$ with a high precision. Forbes [8] provides the values of $H_k$ for
dense $k$-tuples, or $k$-tuplets, up to $k = 24$. The $k$-tuple conjecture implies that

• The sequence of maximal gaps between prime $k$-tuples of any given type is infinite.
(Thus, all OEIS sequences mentioned in the Introduction are infinite.)

• When $x \to \infty$, the largest gaps below $x$ will grow (asymptotically) at least as fast as
average gaps, i.e., as fast as $O(\log^k x)$ or faster.

But exactly how much faster? Conjectures (D) and (E) below give plausible answers.

4.2 Conjectured asymptotics for gaps between k-tuples

Let $C_k$ denote the reciprocal to the corresponding Hardy-Littlewood constant: $C_k = H_k^{-1}$.
The following formulas provide rough estimates of the gap $g_k(p)$ ending at a prime $p$:

(A) Average gaps between prime $k$-tuples near $p$ are $\bar g_k(p) \sim C_k \log^k p$.

(B) Maximal gaps between prime $k$-tuples are $O(\log^{k+1} p)$:

$g_k(p) < M_k \log^{k+1} p$, where $M_k \approx C_k$ (and possibly $M_k = C_k$).

Defining the expected average gap near $x$ to be $a = C_k \log^k x$ ($x > 3$), we further conjecture:

(C) Maximal gaps below $x$ are asymptotically equal to $C_k \log^{k+1} x$:

$G_k(x) \sim C_k \log^{k+1} x$ as $x \to \infty$, with probable error $O(a \log a)$.

(D) Maximal gaps below $x$ are more accurately described by this asymptotic equality:

$G_k(x) \sim a\log(x/a)$ as $x \to \infty$, with probable error $O(a)$.

(E) For any given type of $k$-tuple, there exists a real $b$ (e.g., $b \approx 2/k$) such that the difference
$G_k(x) - a(\log(x/a) - b)$ changes its sign infinitely often^3 as $x \to \infty$.

^3 Moreover, on finite intervals $x \in [3, x_{\max}]$ the difference $G_k(x) - a(\log(x/a) - b)$ changes its sign more
often than $G_k(x) - L(\log^{k+1} x)$, where $L(\log^{k+1} x)$ is any linear function of $\log^{k+1} x$ and $x_{\max}$ is large enough.

For $k = 2, 4, 6$ the Hardy-Littlewood constants are

$H_2 = 2 \prod_{p \ge 3} \frac{p(p-2)}{(p-1)^2} \approx 1.32032$ (for twin primes),

$H_4 = \frac{27}{2} \prod_{p \ge 5} \frac{p^3(p-4)}{(p-1)^4} \approx 4.15118$ (for prime quadruplets),

$H_6 = \frac{15^5}{2^{13}} \prod_{p \ge 7} \frac{p^5(p-6)}{(p-1)^6} \approx 17.2986$ (for prime sextuplets).


A key ingredient in these conjectures is provided by the constants $C_k = H_k^{-1}$:

$C_2 = H_2^{-1} \approx 0.75739$, $C_4 = H_4^{-1} \approx 0.240895$, $C_6 = H_6^{-1} \approx 0.057808$.
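
The quoted values of $H_k$ and $C_k$ can be checked numerically. The sketch below is an illustration under stated assumptions, not one of the efficient methods of Riesel or Cohen: it truncates the product at a finite prime bound (the parameter `prime_limit` is our own choice) and writes each factor in the standard form $(1 - \nu(q)/q)/(1 - 1/q)^k$, where $\nu(q)$ is the number of distinct residues of the pattern modulo the prime $q$. For twin primes this reduces to $2 \prod_{q \ge 3} q(q-2)/(q-1)^2$. Because convergence is slow, only the first few digits are reliable.

```python
def small_primes(n):
    """All primes up to n, by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def hardy_littlewood_constant(pattern, prime_limit=100_000):
    """Truncated product over primes q <= prime_limit of (1 - nu(q)/q) / (1 - 1/q)**k.

    nu(q) is the number of distinct residues of the pattern offsets modulo q.
    The truncation bound prime_limit is an illustrative assumption.
    """
    k = len(pattern)
    product = 1.0
    for q in small_primes(prime_limit):
        nu = len({d % q for d in pattern})
        product *= (1 - nu / q) / (1 - 1 / q) ** k
    return product


PATTERNS = {
    2: (0, 2),                  # twin primes
    4: (0, 2, 6, 8),            # prime quadruplets
    6: (0, 4, 6, 10, 12, 16),   # prime sextuplets
}

for k, pattern in PATTERNS.items():
    h = hardy_littlewood_constant(pattern)
    print(k, h, 1 / h)  # H_k and C_k = 1 / H_k
```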

Another key ingredient is a statistical formula: for certain kinds of random events occurring
at mean intervals $a$, the record interval between events observed in time $T$ is likely^4 near
$a\log(T/a)$. In the Appendix we derive this formula for $a$ = const. Here, we heuristically apply
this formula for a slowly changing $a$ (i.e., $a = C_k \log^k x$). For now, we can informally
summarize the behavior of maximal gaps between $k$-tuples near $p$ as follows: maximal gaps
are at most about $\log p$ times the average gap.



4.3 Estimators for maximal gaps between k-tuples

Prime $k$-tuples are rare and seemingly "random". Life offers many examples of unusually
large intervals between rare random events, such as the longest runs of dice rolls without
getting a twelve; maximal intervals between clicks of a Geiger counter measuring very low
radioactivity, etc. Reasoning as in the Appendix, one can statistically estimate the mathematical
expectation of maximal intervals between rare random events by expressing them in terms
of the average intervals:

Expected maximal intervals $= a\log(T/a) + O(a)$, (*)

where $a$ is the average interval between the rare events, and $T$ is the total observation time
or length (1 < a < T).
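
Formula (*) is easy to probe with a small simulation. The sketch below makes its own modelling choices, which are assumptions for illustration: intervals are drawn from an exponential distribution with mean $a$, only complete intervals inside $[0, T]$ are counted, and the seed and number of trials are arbitrary. For exponential intervals the average record exceeds $a\log(T/a)$ by roughly $\gamma a \approx 0.58a$, where $\gamma$ is Euler's constant; this is one instance of the $O(a)$ term.

```python
import math
import random


def longest_interval(mean_interval, total_time, rng):
    """Longest complete interval between events observed on [0, total_time]."""
    elapsed, longest = 0.0, 0.0
    while True:
        interval = rng.expovariate(1.0 / mean_interval)
        if elapsed + interval > total_time:
            return longest
        elapsed += interval
        longest = max(longest, interval)


def mean_record(mean_interval, total_time, trials=300, seed=12345):
    """Average of the longest interval over independent simulated observation runs."""
    rng = random.Random(seed)
    total = sum(longest_interval(mean_interval, total_time, rng) for _ in range(trials))
    return total / trials


a, T = 1.0, 2000.0
print("simulated mean record:", mean_record(a, T))
print("a*log(T/a):           ", a * math.log(T / a))
```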

To account for the observed non-linear growth of record gaps between prime $k$-tuples
(Table 1), we will simulate gap sizes using estimator formulas very similar to the above (*).
We define a family of estimators for the maximal gap that ends at $p$:

$E_1(G_k(p)) = \max(a, a\log(p/a) - ba)$, probable error: $O(a)$; (1)
$E_2(G_k(p)) = \max(a, a\log(p/a))$, probable error: $O(a)$; (2)
$E_3(G_k(p)) = a\log p = C_k \log^{k+1} p$, probable error: $O(a \log a)$. (3)

Here, the role of the statistically average interval $a$ is played by the expected average gap
between $k$-tuples: as before, we set $a = C_k \log^k p$. The role of the total observation time $T$
is played by $p$ (we are "observing" gaps that occur up to $p$). We also empirically choose
$b = 2/k$. (The latter choice is not set in stone; by varying the parameter $b$ in $E_1$ one can get
an infinite family of useful estimators with similar asymptotics. In Section 6 we will see that
$b \approx 3$ appears quite suitable for modeling prime gaps, in which case $k = 1$, $a = \log p$, and
$C_1 = 1$.) It is easy to see that, for any fixed $k \ge 1$ and any fixed $b \ge 0$, we have

$a \le E_1 \le E_2 < E_3$ for all $p > 3$, but at the same time $a \ll E_1 \sim E_2 \sim E_3$ as $p \to \infty$.

Indeed, when $p \to \infty$ we have

$E_1(G_k(p)) = a\log(p/a) - ab = a\log p - a(\log a + b) = a\log p - o(\log^{k+1} p) \sim E_3(G_k(p))$.

^4 In particular, if intervals between rare random events have the exponential distribution, with mean
interval $a$ sec and CDF $1 - e^{-t/a}$, then the most probable record interval observed within $T$ sec is about
$a\log(T/a)$ sec (provided that $a \ll T$). After many observations ending at times $a \ll T_1 \ll T_2 \ll T_3 \ldots$
almost surely for some $T_i$ we will observe record intervals exceeding $a\log(T_i/a)$. However, for other values
of $T_i$ we will also observe record intervals below $a\log(T_i/a)$. It is this formula for the most probable extreme,
with the aid of the estimate SD $= O(a)$ for the standard deviation of extremes, that allows us to heuristically
predict the bounds, errors, asymptotics, and sign changes in conjectures (B), (C), (D), (E).

Note: We use the max function in the estimators (1) and (2) to guarantee that the predicted
value is at least $a$. This precaution is needed because, if $p$ is not large enough, $\log(p/a)$
might be negative or too small. We want our estimators to give positive predictions no less
than $a$ even in such cases.

The above conjectures (C), (D), (E) tell us that $E_1$ and $E_2$ are better estimators than
$E_3$: the probable error of $E_3$ is greater than that of $E_1$ or $E_2$. In Section 5.1 we will compare
the predictions obtained with these estimators to the actual sizes of maximal gaps.
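
As a concrete illustration, the three estimators can be written as a few lines of Python. This is a minimal sketch under the definitions above: the constants are the values of $C_k$ quoted in Section 4.2, $b$ defaults to $2/k$, and the function names are our own.

```python
import math

# C_k = 1 / H_k for twin primes, quadruplets and sextuplets, as quoted in Section 4.2.
C = {2: 0.75739, 4: 0.240895, 6: 0.057808}


def expected_average_gap(p, k):
    """a = C_k * log(p)**k, the expected average gap near p."""
    return C[k] * math.log(p) ** k


def estimators(p, k, b=None):
    """Return (a, E1, E2, E3) for the maximal gap ending at p; b defaults to 2/k."""
    if b is None:
        b = 2.0 / k
    a = expected_average_gap(p, k)
    e1 = max(a, a * math.log(p / a) - b * a)
    e2 = max(a, a * math.log(p / a))
    e3 = a * math.log(p)
    return a, e1, e2, e3


# Largest twin-prime gap listed in Table 2: 23358, ending at p = 742914612279527.
a, e1, e2, e3 = estimators(742914612279527, 2)
print(a, e1, e2, e3)
```

For this record gap the prediction $E_1$ differs from the observed value 23358 by less than one expected average gap $a$, while $E_3$ is several multiples of $a$ too high. For small $p$ the max clamp takes over: with $k = 4$ and $p = 101$ both $E_1$ and $E_2$ return $a$.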

4.4 Why extreme value statistics? 

In number theory, probabilistic models such as Cramér's model [4] face serious difficulties.
One such difficulty will be noted in Section 6. Pintz [22] points out additional problems
with Cramér's model. Number-theoretic objects (such as primes or prime $k$-tuples) are too
peculiar; they are clearly not independent and cannot be flawlessly simulated by independent
and identically distributed (i.i.d.) random variables or "events" or "coin tosses" that we
usually deal with in probabilistic models. Why then should one build heuristics for prime
$k$-tuples based on extreme value statistics?

An obvious reason is that we are studying extreme gaps, so it would be unwise to outright
dismiss the existing extreme value theory without giving it a try. When our goal is just to
guess the right formula, rigor is not the highest priority; it is perhaps more important to
accumulate as much evidence as possible, look for counterexamples, and make reasonable
simplifications. The above formula for the expected maximal interval (*) appears to be at
the right level of simplification and fits the actual record gaps fairly well even without the
$O(a)$ term (as we will see in Section 5.1). To fine-tune formula (*) for record gaps between
prime $k$-tuples, we simply have to find a suitable $O(a)$ term. The latter can be done using
number-theoretic insights and/or numerical evidence.

Extreme value theory also offers additional benefits. Not only does it tell us the mathematical
expectation of extremes in random sequences; it also predicts distributions of extremes.
While in general there are infinitely many probability distribution laws, there exist
only three types of limiting extreme value distributions applicable to sequences of i.i.d. random
variables: the Gumbel, Fréchet, and Weibull distributions [1]. When no limiting extreme
value distribution exists, a known type of extreme value distribution may still be a good
approximation [11], [25]. A large body of knowledge has been accumulated that extends the
same types of extreme value distributions from i.i.d. random variables to certain kinds of
dependent variables, for example, m-dependent random variables [28], exchangeable variables
[2], [9, pp. 163-191], and other situations [1], [9]. Although no theorem currently extends the
known types of extreme value distributions to record gaps between primes or prime $k$-tuples,
we might have an aesthetic expectation that "the usual suspects" would show up here, too. It
turns out that one common type of extreme value distribution, the Gumbel distribution,
does show up! (See Section 5.2, The distribution of maximal gaps.)
5 Numerical results



Let us look at the actual sequences of record gaps obtained from lengthy computations, and
compare them to gap sizes predicted by the heuristic estimators $E_1$, $E_2$, $E_3$ defined above.

5.1 The growth of maximal gaps

Figure 1 shows record gaps between twin primes (A113274) for $p < 10^{15}$; the curves are
predictions obtained with estimators $E_1$, $E_2$, $E_3$. Figure 2 shows similar data for prime
quadruplets (A113404), and Figure 3 for prime sextuplets (A200503). Tables 2-4 give the
relevant numerical data; see also OEIS sequences mentioned in the Introduction.

[Figure 1 plot; horizontal axis: $\log^3 p$, from 0 to 40000.]

Figure 1: Maximal gaps between twin primes {p, p + 2} (A113274). Plotted (bottom to top):
expected average gap $a = 0.75739 \log^2 p$, estimators $E_1 = a\log(p/a) - ba$, $E_2 = a\log(p/a)$,
$E_3 = a\log p = 0.75739 \log^3 p$, where $p$ is the end-of-gap prime; $b = 1$.

Here are some observations based on these numerical results. (As before, $a$ denotes the
expected average gap, $a = C_k \log^k p$, and $b = 2/k$ unless stated otherwise.)

1. Estimators $E_1$ and $E_2$ overestimate some of the actual record gaps, but underestimate
others. For $k \le 6$, the data shows that $E_1$ is closer to a median-unbiased estimator^5.
(We can make it even closer by tweaking the $b$ value; e.g., setting $b \approx 1.2597$ for twin
primes, or $b \approx 0.7497$ for prime quadruplets, would turn $E_1$ into a median-unbiased
estimator for maximal gaps below $10^{15}$.)

2. About 90% of the observed gaps are within $\pm 2a$ of the $E_1$ curve. Over 50% of the
observed gaps are within $\pm a$ of $E_1$. This level of accuracy appears to be in line with
heuristics based on statistical models (where the relevant extreme-value distributions
have the standard deviation $\pi a/\sqrt{6} \approx 1.28a$; see Appendix).

3. Consider median-unbiased estimators $E_{\mathrm{med}}(G_k(p)) = a(\log(p/a) - b_{\mathrm{med}})$ for $p < 10^{15}$.
Computations show that the value of $b_{\mathrm{med}}$ tends to decrease when $k$ increases; also, our
empirical value $b = 2/k$ in $E_1$ is a little above the median-unbiased value $b_{\mathrm{med}}$. (For a
simple way to refine $b$, see remark at the end of sect. 5.2.)

4. For relatively small values of $p$ that we deal with, the estimator $E_3$ may seem useless
(too far above the realistic values). However, all three estimators are asymptotically
equivalent, $E_1 \sim E_2 \sim E_3$ when $p \to \infty$.

5. The estimator $E_3 = C_k \log^{k+1} p$ overestimates all known record gaps. In most cases, the
error of $E_3$ is close to $a \log a$, exactly as expected from extreme-value statistics. Thus
$E_3$ may be a good candidate for an upper bound for all record gaps; so in statement
(B) of section 4.2 we may have $M_k = C_k = H_k^{-1}$, and

$G_k(p) < C_k \log^{k+1} p$ (an analog of Cramér's conjecture).

It would be interesting to see any counterexamples, i.e., gaps exceeding $C_k \log^{k+1} p$.

^5 A median-unbiased estimator $E_{\mathrm{med}}(x)$ has as many observed values above it as below it.
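
Observations 1 and 2 can be spot-checked on the twin-prime rows of Table 2. The sketch below rests on an assumption that should be stated loudly: we read the last column of Table 2 as the normalized deviation $(g - a\log(p/a))/a$, which is an inference from the tabulated numbers rather than a definition given in the text. Under that reading, the deviation of a gap from $E_1$ in units of $a$ is the tabulated value plus $b$. Only the 62 rows legible in Table 2 are used, so the shares are indicative rather than exact.

```python
import statistics

# Last-column values of Table 2 (twin primes): smaller primes first, then larger primes.
# Assumption: each value equals (gap - a*log(p/a)) / a for the end-of-gap prime p.
DELTA = [
    -1.205, -0.112, 1.247, -0.396, -1.011, -1.189, -0.486, 0.645, 0.290, 0.834,
    -1.728, -1.161, 1.577, -0.721, -1.689, -2.070, -1.948, -0.918, -1.805, 0.601,
    -1.646, -1.812, -2.611, -2.451, -1.454, -1.260, 0.571, -0.816, -0.075, -2.831,
    -2.888,
    -1.625, -1.323, -1.604, -1.865, -1.262, -1.747, -1.792, -2.195, -1.032, -1.081,
    -0.998, -1.002, -1.309, -0.916, -0.481, -2.773, -2.255, -2.181, -1.793, -1.398,
    -1.351, -2.291, -2.178, -2.539, -1.507, -0.864, -0.940, -1.467, -0.955, -0.834,
    -1.149,
]

B = 1.0  # b = 2/k with k = 2 (twin primes)


def share_within(deltas, b, width):
    """Fraction of gaps whose deviation from a*log(p/a) - b*a is at most width*a."""
    return sum(abs(d + b) <= width for d in deltas) / len(deltas)


print("share within 2a of E1:", share_within(DELTA, B, 2.0))
print("share within  a of E1:", share_within(DELTA, B, 1.0))
# Value of b that would make the estimator median-unbiased on these rows only.
print("median-unbiased b on these rows:", -statistics.median(DELTA))
```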

[Figure 2 plot; horizontal axis: $\log^5 p$, from 0 to 5.0E+07.]

Figure 2: Maximal gaps between prime quadruplets {p, p + 2, p + 6, p + 8} (A113404). Plotted
(bottom to top): expected average gap $a = 0.24089 \log^4 p$, estimators $E_1 = a\log(p/a) - ba$,
$E_2 = a\log(p/a)$, $E_3 = a\log p = 0.24089 \log^5 p$, where $p$ is the end-of-gap prime; $b = 1/2$.

Absolute error. The absolute error $|E_i - G_k(p)|$ tends to grow (but not monotonically) as
$p \to \infty$ for all three estimators $E_1$, $E_2$, $E_3$. Heuristically, we expect the absolute error to
be unbounded and, on average, continue to grow for all three estimators. Probable absolute
errors are $O(a)$ for $E_1$ and $E_2$, and $O(a \log a)$ for $E_3$.

Relative error. The relative error $|\varepsilon_i| = |E_i - G_k(p)|/G_k(p)$ tends to decrease (but not
monotonically) for all three estimators as $p \to \infty$. It may not be obvious from Figures 1-3,
but we must have $|\varepsilon_i| \to 0$ either for all three estimators or for none of them. (Note: the
limit of $|\varepsilon_i|$ as $p \to \infty$ might not exist at all.)

Error in average-gap units a. The error $(E_3 - G_k(p))/a$, i.e., the $E_3$ error expressed as
a number of expected average gaps, grows about as fast as $\log a$ (but not monotonically).
Judging from limited numerical data, the corresponding error $(E_i - G_k(p))/a$ seems bounded
as $p \to \infty$ if we use estimators $E_1$ or $E_2$. Heuristically, for $E_1$ and $E_2$ this error should remain
bounded for the majority (but not all) of the record gaps.

[Figure 3 plot; horizontal axis: $\log^7 p$, from 0 to 5.0E+10; vertical axis up to 1.6E+09.]

Figure 3: Maximal gaps between prime sextuplets {p, p + 4, p + 6, p + 10, p + 12, p + 16}
(A200503). Plotted (bottom to top): expected average gap $a = 0.057808 \log^6 p$, estimators
$E_1 = a\log(p/a) - ba$, $E_2 = a\log(p/a)$, $E_3 = a\log p = 0.057808 \log^7 p$, where $p$ is the
end-of-gap prime; $b = 1/3$.

Overall, the prediction that record gaps are about $a\log(p/a) + O(a)$ appears correct for the
vast majority of actual gaps, as far as we have checked ($p < 10^{15}$). Note that the "optimal"
$O(a)$ term ($-ba$ in the $E_1$ estimator) is negative, at least for $k \le 6$. For larger values of $k$,
the parameter $b$ gets closer to zero. Empirically, for $k$-tuples with $k > 6$, the $E_1$ estimator
will likely produce good results even with $b \approx 0$. Therefore, for large $k$ we might want to
simplify the model and use $b = 0$, i.e., use the estimator $E_2 = \max(a, a\log(p/a))$, the dotted
curve in the above figure. However, maximal gap estimators with certain special properties
(e.g., median-unbiased estimators) will still require nonzero values of $b$.

TABLE 2

Maximal gaps between twin primes {p, p + 2} below $10^{15}$

Initial primes:                             Initial primes:
1st twin      2nd twin      g_2    *        1st twin          2nd twin          g_2     *

311           347           36     -1.205   65095731749       65095739789       8040    -1.625
347           419           72     -0.112   134037421667      134037430661      8994    -1.323
659           809           150    1.247    198311685749      198311695061      9312    -1.604
2381          2549          168    -0.396   223093059731      223093069049      9318    -1.865
5879          6089          210    -1.011   353503437239      353503447439      10200   -1.262
13397         13679         282    -1.189   484797803249      484797813587      10338   -1.747
18539         18911         372    -0.486   638432376191      638432386859      10668   -1.792
24419         24917         498    0.645    784468515221      784468525931      10710   -2.195
62297         62927         630    0.290    794623899269      794623910657      11388   -1.032
187907        188831        924    0.834    1246446371789     1246446383771     11982   -1.081
687521        688451        930    -1.728   1344856591289     1344856603427     12138   -0.998
688451        689459        1008   -1.161   1496875686461     1496875698749     12288   -1.002
850349        851801        1452   1.577    2156652267611     2156652280241     12630   -1.309
2868959       2870471       1512   -0.721   2435613754109     2435613767159     13050   -0.916
4869911       4871441       1530   -1.689   4491437003327     4491437017589     14262   -0.481
9923987       9925709       1722   -2.070   13104143169251    13104143183687    14436   -2.773
14656517      14658419      1902   -1.948   14437327538267    14437327553219    14952   -2.255
17382479      17384669      2190   -0.918   18306891187511    18306891202907    15396   -2.181
30752231      30754487      2256   -1.805   18853633225211    18853633240931    15720   -1.793
32822369      32825201      2832   0.601    [illegible]       [illegible]       16362   -1.398
96894041      96896909      2868   -1.646   23634280586867    23634280603289    16422   -1.351
136283429     136286441     3012   -1.812   38533601831027    38533601847617    16590   -2.291
234966929     234970031     3102   -2.611   43697538391391    43697538408287    16896   -2.178
248641037     248644217     3180   -2.451   56484333976919    56484333994001    17082   -2.539
255949949     255953429     3480   -1.454   74668675816277    74668675834661    18384   -1.507
390817727     390821531     3804   -1.260   116741875898981   116741875918727   19746   -0.864
698542487     698547257     4770   0.571    136391104728629   136391104748621   19992   -0.940
2466641069    2466646361    5292   -0.816   221346439666109   221346439686641   20532   -1.467
4289385521    4289391551    6030   -0.075   353971046703347   353971046725277   21930   -0.955
19181736269   19181742551   6282   -2.831   450811253543219   450811253565767   22548   -0.834
24215097497   24215103971   6474   -2.888   742914612256169   742914612279527   23358   -1.149











In 2008 Fischer [5] computed maximal gaps between twin primes up to 2 x 10^^. For prior 
computations see also OEIS A113274, A113275 by Boncompagni et al. [27] (2005).
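
The rows of Table 2 can be cross-checked with a few lines of Python. The sketch below assumes, as an inference from the numbers rather than from any definition stated in the text, that the last column of the table is the normalized deviation $(g - a\log(p/a))/a$, where $p$ is the end-of-gap prime (the initial prime of the second twin) and $a = C_2 \log^2 p$. The function name is our own.

```python
import math

C2 = 0.75739    # reciprocal Hardy-Littlewood constant for twin primes (Section 4.2)
C4 = 0.240895   # the same for prime quadruplets


def normalized_deviation(end_prime, gap, c_k, k):
    """(gap - a*log(p/a)) / a, with a = c_k * log(p)**k and p the end-of-gap prime."""
    a = c_k * math.log(end_prime) ** k
    return gap / a - math.log(end_prime / a)


# (end-of-gap prime, gap, tabulated last column), copied from Table 2.
TWIN_ROWS = [
    (347, 36, -1.205),
    (419, 72, -0.112),
    (809, 150, 1.247),
    (742914612279527, 23358, -1.149),
]

for p, gap, tabulated in TWIN_ROWS:
    print(p, gap, tabulated, round(normalized_deviation(p, gap, C2, 2), 3))
```

On these rows the recomputed values agree with the tabulated ones to within rounding, which supports this reading of the column. The same function with `C4` and `k = 4` can be applied to the quadruplet rows of Table 3.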
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TABLE 3 

Maximal gaps between prime quadruplets {p, p + 2, p + 6, p + 8} below 10 



Initial 


primes: 


9a 


94 


Initial primes: 


94 


n* 
94 


I''' quadruplet 


2"'^ quadruplet 


quadruplet 


2"<i quadruplet 


5 


11 


6 


0.430 


3043111031 


3043668371 


557340 

1 (Jt:w 


-0.750 


11 


101 


90 


0.902 


3593321651 


3593956781 


635130 

W (J t_/ 


0.188 


191 


821 


630 


0.770 


5675642501 


5676488561 


846060 


2.366 


821 


1481 


660 


0.192 


25346635661 


25347516191 


880530 


-1.576 


2081 


3251 


1170 


-0.014 


27329170151 


27330084401 


914250 


-1.358 


3461 


5651 


2190 


0.194 


35643379901 


35644302761 


922860 


-1.966 


5651 


9431 


3780 


0.518 


56390149631 


r'/^o/^i 1 r' 1 

56391153821 


1004190 


-2.244 


25301 


31721 


6420 


-0.125 


60368686121 


60369756611 


1070490 


-1.697 


34841 


43781 


8940 


0.211 


71335575131 


'71 OO/^/^/^OCT /I 1 

71336662541 


1087410 


-1.967 


88811 


97841 


9030 


-0.998 


76427973101 


'7/^ /ior»/^/^/^ /I cri 

76429066451 


1093350 

vy ly KJ w vy 


-2.089 


122201 


135461 


13260 


-0.539 


87995596391 


87996794651 


1198260 


-1.383 


171161 


187631 


16470 


-0.434 


96616771961 


l \ / ■ / ■ 1 01 /^O /1/~\1 

96618108401 


1 ,S36440 


-0.242 


301991 


326141 


24150 


-0.094 


151023350501 


1 r'1 /~»o /( 0/^/^71 

151024686971 


1336470 


-1.535 


739391 


768191 


28800 


-1.004 


164550390671 


i/^/icrri'7or\i 11 

164551739111 


1348440 


-1.663 


1410971 


1440581 


29610 


-1.957 


171577885181 


1 '71 r'^nor'r' /I01 

171579255431 


1370250 


-1.577 


1468631 


1508621 


39990 


-0.977 


210999769991 


01 1 r^/~i 1 o/^r\/^oi 

211001269931 


14QqQ4n 

X ty ty^w 


-0.986 


2990831 


3047411 


56580 


-0.812 


260522319641 


260523870281 


1550640 

X ty ty vjvy^vy 


-1.150 


3741161 


3798071 


56910 


-1.217 


342611795411 


O/lO/^l ylO/1/^1/^1 

342614346161 


2550750 


6.412 


5074871 


5146481 


71610 


-0.714 


1970587668521 


1 n ^/^ r" n/^o /^ 1 1 

1970590230311 


2561790 

ijtyvyx 1 


0.197 


5527001 


5610461 


83460 


-0.049 


r'001 r\ono"i 

4231588103921 


4231591019861 


2915940 


-0.076 


8926451 


9020981 

u\j ^yj ukj ±. 


94530 


-0.379 


5314235268731 


r'oi ,10001 r\o'7^i 

5314238192771 


2924040 


-0.748 


17186591 


17301041 


114450 


-0.678 


7002440794001 


7002443749661 


2955660 


-1.421 


21872441 


22030271 


157830 


0.996 


8547351574961 


8547354997451 


3422490 


0.447 


47615831 


47774891 


159060 


-0.861 


15114108020021 


15114111476741 


3456720 


-1.200 


66714671 


66885851 


171180 


-1.135 


16837633318811 


16837637203481 


3884670 


0.533 


76384661 


76562021 


177360 


-1.202 


30709975578251 


30709979806601 


4228350 


0.134 


87607361 


87797861 


190500 


-1.023 


43785651890171 


43785656428091 


4537920 


0.307 


122033201 


122231111 


197910 


-1.515 


47998980412211 


47998985015621 


4603410 


0.278 


132574061 


132842111 


268050 


0.677 


55341128536691 


55341133421591 


4884900 


0.972 


204335771 


204651611 


315840 


1.022 


92944027480721 


92944033332041 


5851320 


2.995 


628246181 


628641701 


395520 


0.099 


412724560672211 


412724567171921 


6499710 


0.021 


1749443741 


1749878981 


435240 


-1.669 


473020890377921 


473020896922661 


6544740 


-0.293 


2115383651 


2115824561 


440910 


-2.020 


885441677887301 


885441684455891 


6568590 


-2.253 


2128346411 


2128859981 


513570 


-0.617 


947465687782631 


947465694532961 


6750330 


-1.932 


2625166541 


2625702551 


536010 


-0.749 


979876637827721 


979876644811451 


6983730 


-1.356 


2932936421 


2933475731 


539310 


-0.982 











In 2005 Boncompagni computed maximal gaps between prime quadruplets up to 8-digit 
primes; see OEIS A113403, A113404 [27]. For k-tuples with k > 4, no other prior 
computations of maximal gaps are known to the author. 
TABLE 4 

Maximal gaps between prime 6-tuples {p, p + 4, p + 6, p + 10, p + 12, p + 16} below 10^15 

Initial primes: 
1st sextuplet     2nd sextuplet     Gap g_6      g_6^* 

7                 97                90           1.856 
97                16057             15960        1.414 
19417             43777             24360        0.949 
43777             1091257           1047480      1.570 
3400207           6005887           2605680      1.176 
11664547          14520547          2856000      -0.?48 
37055647          40660717          3605070      [illegible] 
82984537          87423097          4438560      -1.646 
89483827          94752727          5268900      [illegible] 
94752727          112710877         17958150     3.778 
381674467         403629757         21955290     1.526 
1569747997        1593658597        23910600     -1.149 
2019957337        2057241997        37284660     0.730 
5892947647        5933145847        40198200     [illegible] 
6797589427        6860027887        62438460     1.224 
14048370097       14112464617       64094520     -0.506 
23438578897       23504713147       66134250     -1.523 
[illegible]       [illegible]       70590030     [illegible] 
29637700987       29715350377       77649390     -1.038 
29869155847       29952516817       83360970     -0.556 
45555183127       45645253597       90070470     -1.064 
52993564567       53086708387       93143820     -1.202 
58430706067       58528934197       98228130     -1.063 
93378527647       93495691687       117164040    -0.935 
97236244657       97367556817       131312160    -0.108 
240065351077      240216429907      151078830    -1.388 
413974098817      414129003637      154904820    -2.566 
419322931117      419481585697      158654580    -2.420 
422088931207      422248594837      159663630    -2.389 
427190088877      427372467157      182378280    -1.353 
610418426197      610613084437      194658240    -1.751 
659829553837      660044815597      215261760    -1.079 
660863670277      661094353807      230683530    -0.427 
853633486957      853878823867      245336910    -0.57? 
1089611097007     1089869218717     258121710    [illegible] 
1247852774797     1248116512537     263737740    -0.9?? 
1475007144967     1475318162947     311017980    0.246 
1914335271127     1914657823357     322552230    -0.170 
1953892356667     1954234803877     342447210    0.436 
3428196061177     3428617938787     421877610    1.085 
9367921374937     9368397372277     475997340    -0.740 
10254799647007    10255307592697    507945690    -0.256 
13786576306957    13787085608827    509301870    -1.159 
21016714812547    21017344353277    629540730    0.084 
33157788914347    33158448531067    659616720    -0.819 
41348577354307    41349374379487    797025180    0.9?? 
72702520226377    72703333384387    813158010    -0.682 
89165783669857    89166606828697    823158840    -1.190 
122421000846367   122421855415957   854569590    -1.723 
139864197232927   139865086163977   888931050    -1.642 
147693859139077   147694869231727   1010092650   -0.077 
186009633998047   186010652137897   1018139850   -0.755 
202607131405027   202608270995227   1139590200   0.603 
332396845335547   332397997564807   1152229260   -0.967 
424681656944257   424682861904937   1204960680   -1.155 
437804272277497   437805730243237   1457965740   1.725 



5.2 The distribution of maximal gaps 

We have just seen that maximal gaps between prime k-tuples below p grow about as fast as 
a log(p/a). Thus, the curve a log(p/a) (the dotted curve in Figures 1-3) may be regarded as 
a "trend." We now take a closer look at the distribution of maximal gaps in 
the neighborhood of this "trend" curve. Our analysis also includes the case k = 1, 
record gaps between primes (A005250). For each k = 1, 2, 4, 6, we make a histogram of 
shifted and scaled (standardized) record gaps: subtract the "trend" a log(p/a) from the actual 
gaps, and then divide the result by the "natural unit" a, the expected average gap. This 
way, all record gaps g_k(p) are mapped to standardized values g_k^* (shown in Tables 2-4): 

\[ g_k(p) \;\mapsto\; g_k^* = \frac{g_k(p) - a\log(p/a)}{a}, \qquad \text{where } a = C_k \log^k p. \]

Record gaps that exceed a log(p/a) are mapped to standardized values g_k^* > 0, while those 
below a log(p/a) are mapped to g_k^* < 0. Note that the majority of known record gaps are 
below the dotted curve in Figures 1-3; accordingly, most of the standardized values g_k^* are 
negative. It is also immediately apparent that the histograms and fitting distributions are 
skewed: the right tail is longer and heavier. This skewness is a well-known characteristic 
of extreme value distributions, and it comes as no surprise that a good fit obtained with 
the help of distribution-fitting software [20] is the Gumbel distribution, a common type of 
extreme value distribution (see Appendix). 
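
As a quick numerical illustration of the standardization in this section, the following Python sketch maps one record gap to its standardized value. The function name and the constant C6_ASSUMED are assumptions made for this example: C6_ASSUMED is taken as roughly 1/17.2986, the reciprocal of the Hardy-Littlewood constant for sextuplets, a value not stated in this section. The sample row is a Table 4 entry.

```python
import math

def standardized_gap(gap, p, k, c_k):
    """Return g* = (gap - a*log(p/a)) / a, with a = c_k * log(p)**k."""
    a = c_k * math.log(p) ** k      # expected average gap near p
    trend = a * math.log(p / a)     # the "trend" curve a*log(p/a)
    return (gap - trend) / a

# Assumption: reciprocal of the Hardy-Littlewood constant for sextuplets.
C6_ASSUMED = 1 / 17.2986

# Table 4 row: record gap 159663630 ending at the sextuplet that starts at 422248594837.
g_star = standardized_gap(159663630, 422248594837, 6, C6_ASSUMED)
print(round(g_star, 3))
```

Under these assumptions the printed value should land close to the tabulated -2.389; small differences in the last digit may come from the precision of the assumed constant.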




Figure 4: The distribution of standardized maximal gaps g_k^*: histograms and the fitting 
Gumbel distribution PDFs. For k = 1 (primes), the histogram shows record gaps below 
4 × 10^18. For k = 2, 4, 6 (k-tuples), the histograms show record gaps below 10^15. 

Here is why we can say that the Gumbel distribution is indeed a good fit: 

(1) Based on goodness-of-fit statistics (the Anderson-Darling test as well as the 
Kolmogorov-Smirnov test), one cannot reject the hypothesis that the standardized values g_k^* 
might be values of independent identically distributed random variables with the Gumbel 
distribution. 

(2) Although a few other distributions could not be rejected either, the Anderson-Darling 
and Kolmogorov-Smirnov goodness-of-fit statistics for the Gumbel distribution are better 
than the respective statistics for any other two-parameter distribution we tried (including 
normal, uniform, logistic, Laplace, Cauchy, power-law, etc.), and better than for several 
three-parameter distributions (e.g., triangular, error, Beta-PERT, and others). 

An equally good or even marginally better fit is the three-parameter generalized extreme 
value (GEV) distribution, which in fact includes the Gumbel distribution as a special case. 
The shape parameter in the fitted GEV distribution turns out to be very close to zero; note that 
a GEV distribution with a zero shape parameter is precisely the Gumbel distribution. The 
scale parameter of the fitted Gumbel distribution is close to one. The mode μ* of the fitted 
distribution is negative. Figure 4 gives the approximate value of μ* for k = 1, 2, 4, 6; μ* is 
the coordinate of the maximum of the distribution PDF (probability density function). 

Note: Now that we have a more precise value of the mode μ*, we can refine the parameter 
b in the E_1 estimator: to estimate the mean record gaps, set -b to the mean (μ* + γ) of the 
fitted Gumbel distribution, whose scale is close to one. (Of course, a lot more data is 
desirable for meaningful refinement!) 
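
To make the relation between the mode, the scale, and the mean of a Gumbel distribution concrete, here is a minimal Python sketch. It uses a method-of-moments fit, which is a simple stand-in chosen for this illustration and not the procedure of the distribution-fitting software [20]. The function names are hypothetical, and the sample consists of the twenty standardized sextuplet values from the final rows of Table 4, far too few for a reliable fit.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329

def gumbel_pdf(x, mu, beta):
    """PDF of the Gumbel (double exponential) distribution with mode mu and scale beta."""
    z = (x - mu) / beta
    return math.exp(-z - math.exp(-z)) / beta

def fit_gumbel_moments(values):
    """Method-of-moments fit: SD = pi*beta/sqrt(6), mean = mu + gamma*beta."""
    beta = statistics.stdev(values) * math.sqrt(6) / math.pi
    mu = statistics.mean(values) - EULER_GAMMA * beta
    return mu, beta

# Standardized sextuplet gaps g_6^* from the last twenty rows of Table 4.
sample = [-1.038, -0.682, -0.556, -1.190, -1.064, -1.723, -1.202, -1.642,
          -1.063, -0.077, -0.935, -0.755, -0.108, 0.603, -1.388, -0.967,
          -2.566, -1.155, -2.420, 1.725]

mu, beta = fit_gumbel_moments(sample)
print(f"mode = {mu:.3f}, scale = {beta:.3f}, mean = {mu + EULER_GAMMA * beta:.3f}")
```

A maximum-likelihood fit would generally be preferable for real work; the moment fit is used here only because it needs nothing beyond the standard library.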
6 On maximal gaps between primes 



Let us now apply our model of gaps to maximal gaps between primes (A005250) [27], [21]: 

Maximal prime gaps are about a log(p/a) - ba, with a = log p and b ≈ 3. 

If all record gaps behave like those in Figure 5 (showing the 75 known record gaps between 
primes p < 4 × 10^18), this would confirm the Cramér and Shanks conjectures: maximal prime 
gaps are smaller than log^2 p, but smaller only by O(a log a). We also easily see that the 
Cramér and Shanks conjectures are compatible with our estimate of record gaps. Indeed, 
for a = log p and any fixed b > 0, we have log^2 p ~ a(log(p/a) - b) < log^2 p as p → ∞. 
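
The three curves compared in this section can be evaluated directly. The sketch below is an illustration only: the function name is hypothetical, and the test case (the maximal gap of 1132 following the prime 1693182318746371) is a well-known entry of OEIS A005250 rather than a value quoted in this section.

```python
import math

def prime_gap_estimators(p, b=3.0):
    """Return (E1, E2, E3) for the end-of-gap prime p, with a = log p."""
    a = math.log(p)                 # expected average gap near p (case k = 1)
    e2 = a * math.log(p / a)        # trend curve a*log(p/a)
    e1 = e2 - b * a                 # E1 = a*log(p/a) - b*a
    e3 = a * math.log(p)            # Cramer-Shanks value log^2 p
    return e1, e2, e3

# Maximal gap 1132 after the prime 1693182318746371 (OEIS A005250).
p_end = 1693182318746371 + 1132
e1, e2, e3 = prime_gap_estimators(p_end)
print(f"E1 = {e1:.1f}, E2 = {e2:.1f}, E3 = {e3:.1f}, actual gap = 1132")
```

This particular gap is unusually large for its location, so it sits above the E_1 estimate while remaining below log^2 p.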



Figure 5: Maximal gaps between consecutive primes (A005250). Plotted (bottom to top): 
expected average gap a = log p, estimators E_1 = a log(p/a) - ba, E_2 = a log(p/a) (dotted), 
E_3 = a log p = log^2 p, where p is the end-of-gap prime; b = 3. 

Notes: Maier's theorem (1985) [18] states that there are (relatively short) intervals where 
typical gaps between primes are greater than the average (log p) expected from the prime 
number theorem. Based in part on Maier's theorem, Granville [12] adjusted the Cramér 
conjecture and proposed that, as p → ∞, limsup(G(p)/log^2 p) ≥ 2e^{-γ} = 1.1229... This 
would mean that an infinite subsequence of maximal gaps must lie above the Cramér-Shanks 
upper limit log^2 p, i.e., above the line in Figure 5, and this hypothetical subsequence (or 
an infinite subset thereof) must approach a line whose slope is about 1.1229 times steeper! 
However, for now, there are no known maximal prime gaps above log^2 p. Interestingly, Maier 
himself did not voice serious concerns that the Cramér or Shanks conjecture might be in 
danger because of his theorem; thus, Maier and Pomerance [19] simply remarked in 1990: 

Cramér conjectured that limsup G(x)/log^2 x = 1, while Shanks made the stronger 
conjecture that G(x) ~ log^2 x, but we are still a long way from proving these statements. 
7 Corollaries: Legendre-type conjectures 

Assuming the conjectures of Section 4, one can state (and verify with the aid of a computer) a 
number of interesting corollaries. The following conjectures generalize Legendre's conjecture 
about primes between squares. 

• For each integer n > 0, there is always a prime between n^2 and (n + 1)^2. (Legendre) 

• For each integer n > 122, there are twin primes between n^2 and (n + 1)^2. (A091592) 

• For each integer n > 3113, there is a prime triplet between n^2 and (n + 1)^2. 

• For each integer n > 719377, there is a prime quadruplet between n^2 and (n + 1)^2. 

• For each integer n > 15467683, there is a prime quintuplet between n^2 and (n + 1)^2. 

• There exists a sequence {s_k} such that, for each integer n > s_k, there is a prime k-tuplet 
between n^2 and (n + 1)^2. (This {s_k} is OEIS A192870: 0, 122, 3113, 719377, ...) 

Another family of Legendre-type conjectures for prime k-tuplets can be obtained by replacing 
squares with cubes, 4th, 5th, and higher powers of n: 

• For each integer n > 0, there are twin primes between n^3 and (n + 1)^3. 

• For each integer n > 0, there is a prime triplet between n^4 and (n + 1)^4. 

• For each integer n > 0, there is a prime quadruplet between n^5 and (n + 1)^5. 

• For each integer n > 0, there is a prime quintuplet between n^6 and (n + 1)^6. 

• For each integer n > 6, there is a prime sextuplet between n^7 and (n + 1)^7. 

A further generalization is also possible: 

• There is a prime k-tuplet between n^r and (n + 1)^r for each integer n > n_0(k, r), where 
n_0(k, r) is a function of k > 1 and r > 1. 

To justify the above Legendre-type conjectures, we can assume the k-tuple conjecture plus 
statement (B) (sect. 4.2) bounding the size of gaps between k-tuples: g_k(p) < M_k log^{k+1} p. 
We can now use the following elementary argument: Consider a fixed r > 1, and let x 
be a number in the interval between n^r and (n + 1)^r. Then, for large n, the interval size 
d_r = (n + 1)^r - n^r ~ r n^{r-1} will be asymptotic to r x^{(r-1)/r}: because x ~ n^r and 
d_r ~ r n^{r-1} when n → ∞, we have n ~ x^{1/r} and d_r ~ r x^{(r-1)/r} when x → ∞. But any 
positive power of x grows faster than any positive power of log x when x → ∞. So x^{(r-1)/r} 
must grow faster than log^{k+1} x. Therefore, the intervals [n^r, (n + 1)^r], whose sizes are 
about r x^{(r-1)/r}, will eventually become much larger than the largest gaps between prime 
k-tuples containing primes p ≤ x. For smaller n, a computer check finishes the job. 
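
The "computer check" for small n can be sketched as follows for the twin-prime case between consecutive squares. This is an illustrative brute-force check under stated assumptions, not the author's program: the function names and the search bound N_MAX = 1000 are choices made for this example.

```python
def sieve(limit):
    """Sieve of Eratosthenes: is_prime[i] is 1 iff i is prime, for 0 <= i <= limit."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            # Cross out the multiples of i starting from i*i.
            is_prime[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return is_prime

def has_twin_between_squares(n, is_prime):
    """True if some twin pair (p, p+2) lies strictly between n^2 and (n+1)^2."""
    lo, hi = n * n, (n + 1) * (n + 1)
    return any(is_prime[p] and is_prime[p + 2] for p in range(lo + 1, hi - 2))

N_MAX = 1000                                   # hypothetical search bound
table = sieve((N_MAX + 1) ** 2)
failures = [n for n in range(1, N_MAX + 1) if not has_twin_between_squares(n, table)]
print(failures)
```

Within this range the list of exceptional n should stop at 122, in line with the twin-prime conjecture stated above; such a finite check illustrates the claim but does not prove it.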

However, this is not a proof: we have relied on unproven assumptions. As Hardy and 
Wright pointed out in 1938 (referring to the infinitude of twin primes and prime triplets): 

Such conjectures, with larger sets of primes, can be multiplied, but their proof or 
disproof is at present beyond the resources of mathematics. [15, p. 6] 

Many years have passed, yet conjectures like these remain exceedingly difficult to prove. 
8 Appendix: a note on statistics of extremes 



In this appendix we use extreme value statistics to derive a simple formula expressing the 
expected maximal interval between rare random events in terms of the average interval: 

\[ E(\text{max interval}) = a\log(T/a) + O(a), \]

where a is the average interval between the rare events, T is the total observation time 
or length, as applicable, and E(max interval) stands for the mathematical expectation of 
the maximal interval. The formula holds for random events occurring at exponentially 
distributed (real-valued) intervals, as well as for events occurring at geometrically distributed 
discrete (integer-valued) intervals. (For more information on extreme value distributions of 
random sequences see Gumbel's classical book [13]; for extreme value distributions of discrete 
random sequences, such as head runs in coin toss sequences, see also the papers of Schilling 
[25] or Gordon, Schilling, and Waterman [11] and further references therein.) 

8.1 Two problems about random events 

For illustration purposes, we will use two problems: 

Problem A. Consider a non-stop toll bridge with very light traffic. Let P > 1/2 be the 
probability that no car crosses the toll line during a one-second interval, and q = 1 — P the 
probability to see a car at the toll line during any given second. Suppose we observe the 
bridge for a total of T seconds, where T is large, while P is constant. 

Problem B. Consider a biased coin with a probability of heads P > 1/2 (and the probability 
of tails q = 1 — P). We toss the coin a total of T times, where T is large. 

In both problems, answer the following questions about the rare events (cars or tails): 

(1) What is the expected total number of rare events in the observation series of length T? 

(2) What is the expected average interval a between events (i.e., between cars/tails)? 

(3) What is the expected maximal interval between events, as a function of a? 

Notice that the first two questions are much easier than the third. Here are the easy answers: 

(1) Because the probability of the event is q at any given second/toss, we expect a total of 
nq events after n seconds/tosses, and a total of Tq events at the end of the entire observation 
series of length T. 

(2) To estimate the expected average interval a between events, we divide the total length 
T of our observation series by the expected total number of events Tq. So a reasonable 
estimate of the expected average interval between events is a ≈ T/(Tq) = 1/q. 

Footnote: For a small q, the estimate a ≈ 1/q is quite accurate: its error is only O(1). To prove this, we can 
use specific distributions of intervals between events. Thus, if in Problem A the intervals between cars are 
distributed exponentially (CDF 1 - P^t = 1 - e^{-t/a}), then the mean interval is a = 1/log(1/P) = 1/q + O(1). 
If in Problem B the observed runs of heads are distributed geometrically (CDF 1 - P^{r+1}), then the mean 
run of heads is P/q = 1/q + O(1). 
(3) Quite obviously, we can predict that the expected maximal interval is less than T, but 
not less than a: 

a ≤ E(max interval) < T. 

The expected maximal interval will likely depend on both a and T: 

E(max interval) = f(a, T). 

It is also reasonable to expect that f(a, T) should be an increasing function of both 
arguments, a and T. Can we say anything more specific about the expected maximal interval? 

8.2 An estimate of the most probable maximal interval 

In both problems A and B we will assume that 1 ≪ a ≪ T, or, in plain English: 

• the events are rare (1 ≪ a), and 

• our observations continue for long enough to see many events (a ≪ T). 

In Problem A, to estimate the most probable maximal interval between cars we proceed as 
follows: After n seconds of observations, we would have seen about nq cars, hence about 
nq intervals between cars. The intervals are independent of each other and real-valued. A 
known good model for the distribution of these intervals is the exponential distribution that 
has the cumulative distribution function (CDF) 1 - P^t: 

with probability P, any given interval between cars is at least 1 second; 

with probability P^2, any given interval is at least 2 seconds; 

with probability P^3, any given interval is at least 3 seconds; ... 

with probability P^t, any given interval is at least t seconds. 

Thus, after n seconds of observations and about nq carless intervals, we would reasonably 
expect that at least one interval is no shorter than t seconds if we choose t such that 

P^t × (nq) ≥ 1. 

Now it is easy to estimate the most probable maximal interval t_max: 

\[ P^{t_{\max}} \approx \frac{1}{nq}, \qquad (1/P)^{t_{\max}} \approx nq, \qquad t_{\max} \approx \log_{1/P}(nq). \]

In Problem B we can estimate the longest run of heads R_n after n coin tosses by reasoning 
very similarly. One notable difference is that now the head runs are discrete (have integer 
lengths). Accordingly, they are modeled using the geometric distribution. Schilling [25] has 
this estimate for the longest run of heads after n tosses, given the heads probability P: 

\[ R_n \approx \log_{1/P}(nq). \]

In both problems, the estimates for the most probable maximal interval (as a function of P 
and n) have the same form log_{1/P}(nq). Therefore, it is reasonable to expect that the answers 
to our original question (3) in both problems A and B will also be the same or similar 
functions of the average interval a, even though the problems are modeled using different 
distributions of intervals. We will soon see that this indeed is the case. 
8.3 If random events are rare... 

If the events (cars in Problem A, or tails in Problem B) are rare, then P is close to 1, and q 
is close to 0. Using the Taylor series expansion of log(1/(1 - q)), we can write: 

\[ \log_{1/P}(nq) = \frac{\log(nq)}{\log(1/(1-q))} = \frac{\log(nq)}{q + q^2/2 + q^3/3 + \cdots} \approx \frac{\log(nq)}{q} \approx a\log(n/a). \]

For a long series of observations, with the total length or duration n = T (e.g., T tosses 
of a biased coin, or T seconds of observing the bridge), the estimate for the most probable 
maximal interval becomes a log(T/a). 
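
The quality of the rare-event approximation is easy to check numerically. The short Python sketch below compares log_{1/P}(Tq) with a log(T/a), taking a = 1/q; the function names and the chosen values of P and T are illustrative assumptions.

```python
import math

def most_probable_max_interval(P, T):
    """Logarithm of T*q to the base 1/P, where q = 1 - P."""
    q = 1.0 - P
    return math.log(T * q) / math.log(1.0 / P)

def rare_event_approximation(P, T):
    """a*log(T/a) with the average interval a = 1/q."""
    a = 1.0 / (1.0 - P)
    return a * math.log(T / a)

T = 10**6
for P in (0.9, 0.99, 0.999):
    exact = most_probable_max_interval(P, T)
    approx = rare_event_approximation(P, T)
    print(f"P = {P}: log_(1/P)(Tq) = {exact:.1f}, a*log(T/a) = {approx:.1f}")
```

The relative difference between the two expressions shrinks roughly in proportion to q, as the Taylor expansion suggests.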

8.4 Expected maximal intervals 

The specific formulas for expected maximal intervals between rare events depend on the 
nature of events in the problem (whether the initial distribution of intervals is exponential 
or geometric). However, as T → ∞, in the formulas for both cases the highest-order term 
turns out to be the same: a log(T/a), which was precisely our estimate for the most probable 
maximal interval. 

(A) Exponential initial distribution. Fisher and Tippett [7], Gnedenko [10], Gumbel 
[13] and other authors showed that, for initial distributions of exponential type (including, as 
a special case, the exponential distribution), the limiting distribution of maximal terms in a 
random sequence is the double exponential distribution, often called the Gumbel 
distribution. In particular, if intervals between cars in Problem A have exponential distribution 
with CDF 1 - P^t = 1 - e^{-t/a}, then the distribution of maximal intervals has these 
characteristics: 

N-event CDF: (1 - e^{-t/a})^N = (1 - (1/N) e^{-(t-μ)/a})^N (distribution for N ≈ Tq events), 

Limiting CDF: exp(-e^{-(t-μ)/a}) (Gumbel distribution) [13, p. 157], 

Scale = a = 1/log(1/P) (equal to the expected average interval), 

Mode = μ = μ_N = a log N ≈ a log(T/a) ≈ log_{1/P}(Tq), 

Median = μ - a log log 2 ≈ a log(T/a) + 0.3665a, 

Mean = μ + γa ≈ a log(T/a) + 0.5772a, 

where γ = 0.5772... is the Euler-Mascheroni constant. The mean value of observed maximal 
intervals in Problem A will converge almost surely to the mean μ + aγ of the Gumbel 
distribution; therefore: 

\[ E(\text{max interval}) \approx \log_{1/P}(Tq) + \gamma a \approx a\log(T/a) + \gamma a = a\log(T/a) + O(a). \]

Footnote: Instead of the scale parameter a, Gumbel [13, p. 157] uses the parameter α = 1/a. The mode (most 
probable value, also called the location parameter) in the N-event extreme-value distribution resulting from 
an exponential initial distribution is equal to the characteristic extreme a log N [13, p. 114]. The shape of 
the N-event extreme-value distribution approaches that of the limiting distribution as N → ∞. 
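
For the exponential case the mean can be cross-checked without any simulation. The sketch below relies on a standard fact that is not stated in the text above: the maximum of N independent exponential intervals with mean a has expectation a(1 + 1/2 + ... + 1/N). The function names and the sample values a = 100, T = 10^6 are assumptions made for this illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329

def expected_max_exponential(a, n_events):
    """Exact mean of the maximum of n_events i.i.d. exponential intervals with mean a."""
    return a * sum(1.0 / k for k in range(1, n_events + 1))

def gumbel_mean_estimate(a, total):
    """a*log(T/a) + gamma*a, taking N = T/a events."""
    return a * math.log(total / a) + EULER_GAMMA * a

a, T = 100.0, 1.0e6
n_events = round(T / a)             # about 10**4 intervals
exact = expected_max_exponential(a, n_events)
estimate = gumbel_mean_estimate(a, T)
print(f"a*H_N = {exact:.3f}, a*log(T/a) + gamma*a = {estimate:.3f}")
```

The two numbers differ by about a/(2N), which is negligible next to the O(a) term.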

Historical notes: In 1928 Fisher and Tippett [7] described three types of limiting 
extreme-value distributions and showed that the double exponential (Gumbel) distribution is the 
limiting extreme-value distribution for a certain wide class of random sequences. They also 
computed, among other parameters, the mean-to-mode distance in the double exponential 
distribution [7, p. 186]; it is this result that allows one to conclude that the mean is μ + aγ if 
the mode is μ. Gnedenko (1943) [10] rigorously proved the necessary and sufficient conditions 
for an initial distribution to be in the domain of attraction of a given type of limiting 
distribution. 

(B) Geometric initial distribution. Surprisingly, in this case the limiting extreme-value 
distribution does not exist [25, p. 203], [11, p. 280]. For the longest run of heads R_n in a series 
of n tosses of a biased coin, with the probability of heads P, we have 

\[ E(R_n) = \log_{1/P}(nq) + \frac{\gamma}{\log(1/P)} - \frac{1}{2} + \text{smaller terms} \]

[25, p. 202], where the first term is the same as in Problem A (up to a substitution n = T). The 
sum of the other terms is O(a) when P is close to 1; so, again, we have 

\[ E(R_n) = a\log(n/a) + O(a). \]



8.5 Standard deviation of extremes 

As above, the specific formula for standard deviation (SD) in distributions of maximal 
intervals between events depends on the nature of the problem (whether the initial distribution 
of intervals is exponential or geometric). Still, in both cases SD ≈ πa/√6 = O(a). 

(A) Exponential initial distribution. Here the limiting distribution of maximal intervals 
is the Gumbel distribution with the scale a = 1/log(1/P); therefore the SD of maximal 
intervals must be very close to the SD of the Gumbel distribution [13, p. 116, 174]: 

\[ \mathrm{SD}(\text{max interval}) \approx \frac{\pi a}{\sqrt{6}} = O(a). \]

(B) Geometric initial distribution. For the longest run of heads R_n in a series of n 
tosses of a biased coin, the variance is 

\[ \operatorname{Var} R_n = \frac{\pi^2}{6\log^2(1/P)} + \frac{1}{12} + \text{smaller terms} \]

[25, p. 202], where the first term is O(a^2), while the sum of the other terms is much smaller 
than the first term. (Again, recall that for average intervals a between rare events, in this 
case between tails, we have a ≈ 1/log(1/P).) Therefore, the standard deviation is 

\[ \mathrm{SD}\,R_n = \sqrt{\operatorname{Var} R_n} = \frac{\pi}{\sqrt{6}\,\log(1/P)} + \text{a small term} \approx \frac{\pi a}{\sqrt{6}} = O(a). \]
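
The value πa/√6 can also be checked directly in the exponential case. The sketch below uses a standard identity that is an addition of this illustration, not a formula from the text: the maximum of N independent exponential intervals with mean a has variance a^2 (1 + 1/4 + ... + 1/N^2). The function name and the sample value a = 100 are assumptions.

```python
import math

def sd_max_exponential(a, n_events):
    """Exact SD of the maximum of n_events i.i.d. exponential intervals with mean a."""
    return a * math.sqrt(sum(1.0 / (k * k) for k in range(1, n_events + 1)))

a = 100.0
limit = math.pi * a / math.sqrt(6)      # SD of the limiting Gumbel distribution
for n_events in (10, 1000, 100000):
    print(n_events, round(sd_max_exponential(a, n_events), 3), round(limit, 3))
```

The exact SD increases toward the Gumbel value as the number of events grows, and it never depends on T except through N.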
8.6 A shortcut to the answer 

There is a simple way to "guesstimate" the answer a log(T/a) + O(a). If a is the average 
interval between events, then the most probable maximal interval is about a log(T/a) (sect. 8.3). 
We can now simply use the fact that the width of the extreme value distribution is O(a). 
(Imagine what happens if the rare event's probability q is reduced by 50%. This change in 
q would have about the same effect as if every interval became twice as large: then average 
and maximal intervals would also become twice as large, and the extreme value distribution 
would be twice as wide. This immediately implies that the extreme value distribution is O(a) 
wide.) But then the true value of the expected maximal interval cannot be any farther than 
O(a) from our estimate a log(T/a); so the expected maximal interval is a log(T/a) + O(a). 

8.7 Summary 

We have considered maximal intervals between random events in two common situations: 

• rare events occurring at exponentially distributed intervals (Problem A); 

• discrete rare events at geometrically distributed intervals (Problem B). 

These two situations are somewhat different: in the former case maximal intervals have a 
limiting distribution (the Gumbel distribution), while in the latter case no limiting 
distribution exists (here the Gumbel distribution is simply a decent approximation). Nevertheless, 
in both cases the expected maximal interval between events is 

E(max interval) = a log(T/a) + γa + lower-order terms = a log(T/a) + O(a), 

where a is the average interval between events, T is the total observation time or length, and 
the lower-order terms depend on the initial distribution. 

As we have seen in Sections 4-6, a remarkably similar heuristic formula a log(x/a) - ba, with 
an empirical term -ba replacing the "theoretical" γa, satisfactorily describes the following: 

• record gaps between primes below x (a = log x, b ≈ 3; A005250) 

• record gaps between twin primes below x (a = 0.75739 log^2 x, b ≈ 1; A113274) and, 
more generally, 

• record gaps between prime k-tuples (a = C_k log^k x, b ≈ 2/k, where C_k is reciprocal to 
the Hardy-Littlewood constant for the particular k-tuple). 